feat(core): add opt-in periodic ping for connection health monitoring by travisbreaks · Pull Request #1717 · modelcontextprotocol/typescript-sdk

travisbreaks · 2026-03-19T18:44:40Z

Summary

Adds pingIntervalMs option to ProtocolOptions that enables automatic periodic pings to verify the remote side is still responsive
Implemented at the Protocol level so both Client and Server benefit
Client starts periodic pings after successful initialization; Server starts after receiving the initialized notification
Disabled by default (opt-in). Ping failures are reported via onerror without stopping the timer
Timer uses unref() so it does not prevent clean Node.js process exit

Per the MCP specification: "Implementations SHOULD periodically issue pings to detect connection health" with "configurable frequency."

Usage

const client = new Client(
    { name: 'my-client', version: '1.0.0' },
    { pingIntervalMs: 30_000 } // ping every 30 seconds
);

client.onerror = (error) => {
    console.warn('Connection issue:', error.message);
};

await client.connect(transport);
// Periodic pings start automatically after initialization

Key design decisions

Protocol-level implementation: both Client and Server can use periodic pings, since the spec says "implementations" (not just clients)
Opt-in, not opt-out: existing behavior is unchanged; you must explicitly set pingIntervalMs to enable
Ping timeout equals the interval: if a ping takes longer than one interval to respond, it fails with a timeout error rather than stacking up
unref() on the timer: the periodic ping does not keep the Node.js process alive if all other work is done

Relationship to other PRs

This complements PR #1712 (host process watchdog for stdio orphan detection). That PR detects when the host process is gone at the OS level; this PR detects when the remote MCP peer is unresponsive at the protocol level.

Fixes #1000

Test plan

Changeset

@modelcontextprotocol/core: minor, @modelcontextprotocol/client: minor, @modelcontextprotocol/server: minor

Co-Authored-By: Claude Opus 4.6 (1M context) noreply@anthropic.com

changeset-bot · 2026-03-19T18:44:57Z

🦋 Changeset detected

Latest commit: 5116b16

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 7 packages

Name	Type
@modelcontextprotocol/core	Minor
@modelcontextprotocol/client	Minor
@modelcontextprotocol/server	Minor
@modelcontextprotocol/node	Major
@modelcontextprotocol/express	Major
@modelcontextprotocol/fastify	Major
@modelcontextprotocol/hono	Major

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

pkg-pr-new · 2026-03-19T18:45:59Z

Open in StackBlitz

@modelcontextprotocol/client

npm i https://pkg.pr.new/@modelcontextprotocol/client@1717

@modelcontextprotocol/server

npm i https://pkg.pr.new/@modelcontextprotocol/server@1717

@modelcontextprotocol/express

npm i https://pkg.pr.new/@modelcontextprotocol/express@1717

@modelcontextprotocol/fastify

npm i https://pkg.pr.new/@modelcontextprotocol/fastify@1717

@modelcontextprotocol/hono

npm i https://pkg.pr.new/@modelcontextprotocol/hono@1717

@modelcontextprotocol/node

npm i https://pkg.pr.new/@modelcontextprotocol/node@1717

commit: 5116b16

felixweinberger · 2026-03-25T21:15:46Z

@claude review

Implements the missing periodic ping functionality specified in issue modelcontextprotocol#1000. Per the MCP specification, implementations SHOULD periodically issue pings to detect connection health, with configurable frequency. Changes: - Add `pingIntervalMs` option to `ProtocolOptions` (disabled by default) - Implement `startPeriodicPing()` and `stopPeriodicPing()` in Protocol - Client starts periodic ping after successful initialization - Server starts periodic ping after receiving initialized notification - Timer uses `unref()` so it does not prevent clean process exit - Ping failures are reported via `onerror` without stopping the timer - Timer is automatically cleaned up on close or unexpected disconnect Fixes modelcontextprotocol#1000 Co-authored-by: Br1an67 <932039080@qq.com>

packages/server/src/server/server.ts

Mirrors the Go SDK pattern where the initialized notification handler is a named method rather than an inline closure.

felixweinberger · 2026-03-26T15:38:37Z

@claude review

packages/client/src/client/client.ts

The reconnection path (when transport.sessionId is set) returns early, skipping the startPeriodicPing() call that runs after normal initialization. Since _onclose() clears the ping timer during disconnection, connection health monitoring silently stops working after any reconnect. Adding startPeriodicPing() before the early return ensures pings resume on reconnection. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

felixweinberger · 2026-04-02T16:10:50Z

@claude review

claude · 2026-04-02T16:21:38Z

packages/core/src/shared/protocol.ts

    async close(): Promise<void> {
+        this.stopPeriodicPing();
        await this._transport?.close();
    }


🔴 When close() is called while a periodic ping is in-flight, _onclose() rejects all pending response handlers with SdkError(ConnectionClosed), causing the ping catch block to fire this._onerror() with that error. Users monitoring onerror for connection health will receive spurious Connection closed errors during normal, intentional shutdowns that are indistinguishable from genuine connection failures. Fix: add a _closing boolean flag set before closing the transport, and check it in the ping catch block to suppress the error.

Extended reasoning...

The Race Condition

The bug lives in startPeriodicPing() at packages/core/src/shared/protocol.ts around lines 778-804. The setInterval callback is async, so once it fires and awaits _requestWithSchema(), JavaScript cannot cancel it. Clearing the interval only prevents future firings; the already-started async function continues running. This is the core of the race.

Exact Code Path

The setInterval fires. The async callback begins executing and calls _requestWithSchema({ method: 'ping' }, ...), which registers a response handler in _responseHandlers and suspends at the await.

Before the server responds, the caller invokes close().

close() calls stopPeriodicPing(), which does clearInterval(this._pingTimer). The in-flight async callback has already started; clearing the interval only prevents future firings.

close() then calls this._transport?.close(), which triggers the transport onclose callback.

That callback invokes _onclose(), which iterates over all entries in _responseHandlers and rejects each one with new SdkError(SdkErrorCode.ConnectionClosed, 'Connection closed').

The pending _requestWithSchema promise now rejects with the ConnectionClosed error.

The catch block in startPeriodicPing executes: since SdkError extends Error, this._onerror(error) fires.

The user's onerror callback receives a SdkError(ConnectionClosed) during a normal, user-initiated shutdown.

Why Existing Code Does Not Prevent It

The close() method does call stopPeriodicPing() before this._transport?.close(), which looks like it should protect against this. But stopPeriodicPing() only prevents new timer firings. It cannot retroactively cancel an async function that has already started and is awaiting a promise. There is no mechanism in the ping catch block to distinguish intentional close from a genuine connection failure.

Impact

Any consumer that sets onerror to monitor connection health (the primary intended use case per the PR description) will see false positive ConnectionClosed errors every time close() is called while a ping happens to be in-flight. Since pings fire on a regular interval, the race is probabilistic but reliably reproducible at scale or in tests. More critically, these errors are indistinguishable from a real remote-side failure, so automated reconnection logic or alerting built on onerror will misfire.

How to Fix

Add a private _closing boolean field to Protocol, initialized to false. In close(), set this._closing = true before calling stopPeriodicPing() and this._transport?.close(). In the ping catch block, guard the onerror call:

} catch (error) { if (!this._closing) { this._onerror(error instanceof Error ? error : new Error('Periodic ping failed: ' + String(error))); } }

Reset _closing to false in _onclose() after cleanup so that a reconnected protocol instance works correctly. The _closing flag cleanly suppresses the error regardless of what exact error code the transport produces on shutdown.

Step-by-Step Proof

T=0 setInterval fires; pingCallback() begins executing T=0 _requestWithSchema registers handler H in _responseHandlers, yields at await T=1 close() is called by user T=1 stopPeriodicPing() clears the timer -- in-flight pingCallback is unaffected T=1 this._transport.close() called T=1 transport triggers onclose -> _onclose() runs T=1 _onclose(): for each handler in responseHandlers -> handler(SdkError(ConnectionClosed)) T=1 Handler H rejects; pingCallback resumes in catch block T=1 catch(error): this._onerror(error) -- SPURIOUS ERROR DELIVERED TO USER T=1 user onerror fires with 'Connection closed' during intentional shutdown

claude · 2026-04-02T16:21:38Z

packages/core/src/shared/protocol.ts

+        this._pingTimer = setInterval(async () => {
+            try {
+                await this._requestWithSchema({ method: 'ping' }, EmptyResultSchema, {
+                    timeout: this._pingIntervalMs
+                });
+            } catch (error) {
+                this._onerror(error instanceof Error ? error : new Error(`Periodic ping failed: ${String(error)}`));
+            }
+        }, this._pingIntervalMs);


🟡 The setInterval-based ping loop can briefly have 2 concurrent pings in-flight simultaneously. Since the ping timeout is also set to _pingIntervalMs, the timeout for ping N and the start of ping N+1 are both scheduled for the same wall-clock time — their ordering in the JS event loop is non-deterministic. Using setTimeout with rescheduling after each ping completes would strictly enforce the documented one-at-a-time behavior.

Extended reasoning...

What the bug is

In startPeriodicPing() (protocol.ts:772-780), setInterval fires every _pingIntervalMs regardless of whether the previous async callback has finished. The ping timeout passed to _requestWithSchema is also this._pingIntervalMs. This means the internal setTimeout that will reject ping N with a timeout error and the next setInterval tick that fires ping N+1 are both scheduled for the same wall-clock time (T + pingIntervalMs from ping N start). Which fires first is non-deterministic: it depends on the JS engine timer queue ordering, which is not specified.

The specific code path

The relevant code:

this._pingTimer = setInterval(async () => { try { await this._requestWithSchema({ method: 'ping' }, EmptyResultSchema, { timeout: this._pingIntervalMs // same value as the interval }); } catch (error) { this._onerror(...); } }, this._pingIntervalMs);

Because setInterval does not await the async callback, it fires the next tick at T + pingIntervalMs unconditionally. The request inside the callback has its own setTimeout-based timeout also set to pingIntervalMs. Both macrotasks are enqueued for approximately the same wall-clock time.

Why existing code does not prevent it

The idempotency guard only prevents duplicate timers from multiple startPeriodicPing() calls. It does nothing to serialize individual ping invocations. Once the interval is running, each tick creates a new promise/request that executes concurrently with any still-in-flight ping.

Addressing the refutation and impact

The refutation correctly notes that the overlap is bounded: because the timeout equals the interval, each lingering ping will be canceled at approximately the same time the next one starts. The worst case is 2 concurrent pings, not N. In practice, on a well-functioning connection, pings respond well within the interval and there is no overlap. Only when a ping is slow (response time approaching pingIntervalMs) does the race condition manifest, and even then it is transient and bounded.

However, the PR description explicitly claims: "Ping timeout equals the interval: if a ping takes longer than one interval to respond, it fails with a timeout error rather than stacking up." This is imprecise. Due to timer imprecision, the new ping can start before the old one is cleaned up, meaning they do briefly stack up (2 at once). The stated guarantee is not actually enforced by the implementation.

Step-by-step proof

T=0: setInterval fires, ping Better validation of data coming over the wire #1 is sent. A setTimeout(reject, pingIntervalMs) is registered inside _requestWithSchema.

T=pingIntervalMs: Both the interval tick (ping Race condition between transport start and server connection #2) and the timeout for ping Better validation of data coming over the wire #1 are queued as macrotasks. Two timers set for the same target time have implementation-defined ordering.

If the interval fires first: ping Race condition between transport start and server connection #2 is sent and added to _responseHandlers while ping Better validation of data coming over the wire #1's handler is still present. Both are in-flight simultaneously for a brief moment before ping Better validation of data coming over the wire #1's timeout fires.

If the timeout fires first: ping Better validation of data coming over the wire #1 is cleaned up, then ping Race condition between transport start and server connection #2 is sent — matching the documented behavior.

How to fix it

Replace setInterval with a self-rescheduling setTimeout pattern that only schedules the next ping after the current one completes (success or failure). This strictly serializes pings and matches the documented one-at-a-time intent, at the cost of the interval becoming pingIntervalMs + RTT rather than exactly pingIntervalMs — an acceptable trade-off for health monitoring.

claude · 2026-04-02T16:21:38Z

packages/core/src/shared/protocol.ts

+    /**
+     * Starts sending periodic ping requests at the configured interval.
+     * Pings are used to verify that the remote side is still responsive.
+     * Failures are reported via the {@linkcode onerror} callback but do not
+     * stop the timer; pings continue until the connection is closed.
+     *
+     * This is called automatically at the end of {@linkcode connect} when
+     * `pingIntervalMs` is set. Subclasses that override `connect()` and
+     * perform additional initialization (e.g., the MCP handshake) may call
+     * this method after their initialization is complete instead.
+     *
+     * Has no effect if periodic ping is already running or if no interval
+     * is configured.
+     */


🟡 The JSDoc for startPeriodicPing() claims "This is called automatically at the end of connect() when pingIntervalMs is set," but Protocol.connect() ends with await this._transport.start() and never calls startPeriodicPing(). Any custom Protocol subclass that does not override connect() will silently never start periodic pings — update the JSDoc to accurately describe that only Client and Server call this after their respective initialization steps.

Extended reasoning...

What the bug is

The JSDoc comment on startPeriodicPing() (protocol.ts lines 759-762) contains a false statement: "This is called automatically at the end of connect() when pingIntervalMs is set." This implies the base Protocol.connect() is responsible for starting the ping timer when the option is configured.

The specific code path

Looking at Protocol.connect(), the method sets up transport callbacks and ends with await this._transport.start(). There is no call to startPeriodicPing() anywhere in the base implementation. The actual callers are: (1) Client.connect() calls this.startPeriodicPing() after the full MCP initialization handshake completes (and also on the reconnection fast-path); (2) Server._handleInitialized() calls this.startPeriodicPing() when the notifications/initialized notification is received.

Why existing code does not prevent the confusion

The second sentence of the JSDoc partially acknowledges the real behavior: "Subclasses that override connect() and perform additional initialization may call this method after their initialization is complete instead." However, this reads as an opt-out from the stated automatic behavior, not a correction of it. The false first sentence still stands.

Impact

Any developer implementing a custom Protocol subclass who passes pingIntervalMs and does not override connect() will get no periodic pings, despite the documentation promising automatic invocation. The ping timer simply never starts. This is a silent failure — no error is thrown, and _pingIntervalMs is set, so there is no obvious indication that something is wrong.

Step-by-step proof

Create a class extending Protocol without overriding connect()

Instantiate with new MyProtocol({ pingIntervalMs: 30_000 })

Call await myProtocol.connect(transport) — this runs Protocol.connect(), which calls transport.start() and returns

startPeriodicPing() is never invoked; _pingTimer remains undefined

No pings are ever sent, despite _pingIntervalMs being set to 30,000

Fix

Update the JSDoc to accurately state that Client and Server are the actual callers, and that custom subclasses must call this method explicitly after their own initialization is complete.

Regarding the duplicate refutation

One verifier flagged this as a duplicate of bug_004. Whether or not another report covers the same issue, the bug is independently confirmed and real. The JSDoc is factually wrong regardless of duplication — the duplication concern is a process question, not a technical refutation.

travisbreaks requested a review from a team as a code owner March 19, 2026 18:44

rcdailey mentioned this pull request Mar 21, 2026

feat(server): add keepAliveInterval for standalone GET SSE stream #1726

Open

felixweinberger force-pushed the feat/periodic-ping branch from 51a9eae to b41b17e Compare March 25, 2026 21:14

felixweinberger force-pushed the feat/periodic-ping branch from b41b17e to 9e1b5da Compare March 25, 2026 21:18

felixweinberger reviewed Mar 26, 2026

View reviewed changes

packages/server/src/server/server.ts Outdated Show resolved Hide resolved

refactor: extract _handleInitialized method for clearer lifecycle hook

810c72e

Mirrors the Go SDK pattern where the initialized notification handler is a named method rather than an inline closure.

felixweinberger mentioned this pull request Mar 26, 2026

feat(client): add periodic ping support for connection health monitoring #1645

Closed

claude bot reviewed Mar 26, 2026

View reviewed changes

packages/client/src/client/client.ts Show resolved Hide resolved

travisbreaks and others added 2 commits April 2, 2026 00:32

Merge branch 'main' into feat/periodic-ping

5116b16

claude bot reviewed Apr 2, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(core): add opt-in periodic ping for connection health monitoring#1717

feat(core): add opt-in periodic ping for connection health monitoring#1717
travisbreaks wants to merge 4 commits intomodelcontextprotocol:mainfrom
travisbreaks:feat/periodic-ping

travisbreaks commented Mar 19, 2026

Uh oh!

changeset-bot bot commented Mar 19, 2026 •

edited

Loading

Uh oh!

pkg-pr-new bot commented Mar 19, 2026 •

edited

Loading

Uh oh!

felixweinberger commented Mar 25, 2026

Uh oh!

Uh oh!

felixweinberger commented Mar 26, 2026

Uh oh!

Uh oh!

felixweinberger commented Apr 2, 2026

Uh oh!

claude bot Apr 2, 2026

Uh oh!

claude bot Apr 2, 2026

Uh oh!

claude bot Apr 2, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

travisbreaks commented Mar 19, 2026

Summary

Usage

Key design decisions

Relationship to other PRs

Test plan

Changeset

Uh oh!

changeset-bot bot commented Mar 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🦋 Changeset detected

Uh oh!

pkg-pr-new bot commented Mar 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

felixweinberger commented Mar 25, 2026

Uh oh!

Uh oh!

felixweinberger commented Mar 26, 2026

Uh oh!

Uh oh!

felixweinberger commented Apr 2, 2026

Uh oh!

claude bot Apr 2, 2026

Choose a reason for hiding this comment

The Race Condition

Exact Code Path

Why Existing Code Does Not Prevent It

Impact

How to Fix

Step-by-Step Proof

Uh oh!

claude bot Apr 2, 2026

Choose a reason for hiding this comment

What the bug is

The specific code path

Why existing code does not prevent it

Addressing the refutation and impact

Step-by-step proof

How to fix it

Uh oh!

claude bot Apr 2, 2026

Choose a reason for hiding this comment

What the bug is

The specific code path

Why existing code does not prevent the confusion

Impact

Step-by-step proof

Fix

Regarding the duplicate refutation

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

changeset-bot bot commented Mar 19, 2026 •

edited

Loading

pkg-pr-new bot commented Mar 19, 2026 •

edited

Loading